Introduction

ShinyBaseball is an R package containing several Shiny apps that illustrate Statcast and Retrosheet baseball data. The package can be used to understand how pitch outcomes vary by pitch type and location for a specific pitcher or hitter. One can see how the location of pitches depends on the pitch type and the count. Also, by visualizing the location of in-play events, one can see the hot and cold regions for specific hitters. Statcast data from the 2019 season is included with the package.

This document provides an overview with snapshots of the Shiny functions available in ShinyBaseball version 0.4.8.

Installation

This package depends on the following packages, which should be installed first:

shiny, ggplot2, dplyr, stringr, tidyr, lubridate, ggrepel

To install the ShinyBaseball package, use the install_github() function from the remotes package:

library(remotes)
install_github("bayesball/ShinyBaseball")
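Before installing, it can help to check which of the dependencies listed above are missing. The following is a minimal sketch using only base R. The variable names are ours, and the install.packages() call is left commented out so that nothing is installed without review.

```r
# Dependencies listed in the Installation section
deps <- c("shiny", "ggplot2", "dplyr", "stringr", "tidyr", "lubridate", "ggrepel")

# TRUE for each package that can already be loaded
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
missing_deps <- deps[!installed]

if (length(missing_deps) > 0) {
  message("Install first: ", paste(missing_deps, collapse = ", "))
  # install.packages(missing_deps)  # uncomment to install from CRAN
}
```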

Learning Shiny Apps

There are several apps demonstrating some interactive graphics features of Shiny.

PointBrush()

This app illustrates scatterplot brushing. One selects X and Y variables to graph and then brushes a region of interest. The app displays the names and the X and Y values for the points in that region.

PointBrush()

PointClick() and PointHover()

These apps illustrate the clicking and hovering capabilities of Shiny graphics. Again one selects variables to graph. By clicking or hovering near a point, one sees the name and the X and Y values for that point.

PointClick()

FourMeasures() - Brushing Illustration

This app is useful for exploring relationships between four variables in the FanGraphs batting leaderboard. One selects X1, Y1, X2, Y2 variables to graph and sees two scatterplots. One can brush either scatterplot, and the corresponding points in the other scatterplot are colored red. The app also displays the names and variables corresponding to the points in the brushed region.

FourMeasures()

PitchOutcome() - Visualizing Pitch Outcomes

This app is helpful for visualizing pitch outcomes for any pitcher or batter of interest.

We use the Player Type button to indicate whether we want to look at pitching or batting data. Then we enter the Player Name and the Pitch Type of interest.

Here we indicate that we wish to look at a Pitcher, enter Jacob deGrom’s name, and select “FF” from the Pitch Type palette.

We see the locations of all of Jacob deGrom’s four-seamers, where the color of the point corresponds to the Type variable (ball, strike, or inplay).

PitchOutcome()

By selecting “Called” from the Pitches to Display palette, we see the locations of all called pitches, where the color corresponds to called ball or called strike.

By selecting “Swing” from the Pitches to Display palette, we see the locations of all swung pitches, where the color of the point corresponds to the outcome (foul, inplay or miss). One can brush over this scatterplot, and the app displays the number of swung pitches and the miss rate for the points in the brushed region.

By selecting “In-Play” from the Pitches to Display palette, we see the locations of all pitches put in play, where the color of the point corresponds to the outcome (hit or out). One can brush over this scatterplot, and the app displays the number of pitches in play and the hit rate for the points in the brushed region.
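The summary shown for a brushed region amounts to a count and a proportion over the points inside a rectangle. Below is a minimal base R sketch of that calculation, not the app's own code. The function brushed_rate(), the outcome column and the toy data are hypothetical, and plate_x and plate_z are assumed to hold the horizontal and vertical pitch locations.

```r
# Count the pitches inside a rectangle and the share with a given outcome.
brushed_rate <- function(d, xmin, xmax, ymin, ymax, success) {
  inside <- d$plate_x >= xmin & d$plate_x <= xmax &
    d$plate_z >= ymin & d$plate_z <= ymax
  sel <- d[inside, , drop = FALSE]
  n <- nrow(sel)
  list(n = n, rate = if (n > 0) mean(sel$outcome == success) else NA_real_)
}

# Hypothetical swung pitches (not real Statcast data)
swings <- data.frame(
  plate_x = c(-0.5, 0.2, 0.4, 1.5, 0.0),
  plate_z = c(2.0, 3.2, 3.4, 1.0, 3.0),
  outcome = c("foul", "miss", "miss", "inplay", "foul")
)

# Miss rate for swings in a region high in the zone
res <- brushed_rate(swings, xmin = -0.3, xmax = 0.6,
                    ymin = 2.8, ymax = 3.6, success = "miss")
```

The same function gives a hit rate on balls in play if the outcome column holds hit or out and success is set accordingly.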

PitchTypeCount() - Pitch Locations by Type and Count

This app is helpful for comparing pitch locations across pitch types and counts.

We enter the Pitcher Name (here Jacob deGrom). By selecting “FF” and “SL” from the Pitch Type palette and “0-0” from the Count palette, we can compare locations of four-seamers and sliders on a 0-0 count.

PitchTypeCount()

By selecting “FF” and “SL” from the Pitch Type palette and “2-0”, “1-1” and “0-2” from the Count palette, we can compare locations of four-seamers and sliders on these three two-pitch counts.
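The selection just described can be mimicked outside the app with a simple filter. The sketch below uses base R on a hypothetical data frame. The column names pitch_type, balls and strikes are assumptions about how the pitch data might be laid out, not a description of the package's datasets.

```r
# Hypothetical pitches (not real Statcast data)
pitches <- data.frame(
  pitch_type = c("FF", "SL", "CH", "FF", "SL", "FF"),
  balls      = c(2, 1, 0, 0, 0, 3),
  strikes    = c(0, 1, 2, 2, 0, 2)
)

# Build the count label, e.g. "2-0"
pitches$count <- paste(pitches$balls, pitches$strikes, sep = "-")

# Two-pitch counts are those where balls plus strikes equals 2
two_pitch <- pitches$balls + pitches$strikes == 2
keep <- pitches$pitch_type %in% c("FF", "SL") & two_pitch
selected <- pitches[keep, ]

# Cross-tabulate pitch type by count for the selected pitches
table(selected$pitch_type, selected$count)
```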

BrushingZone() - Visualizing In-Play Outcomes

This app is a general function for plotting and brushing different measures on balls put in-play over the zone.

A live version of BrushingZone() can be found on the RStudio Shiny Server:

https://bayesball.shinyapps.io/BrushingZone/

We begin by entering in a batter’s name – here we enter Bryce Harper.

We see the locations of all pitches put in play where the color of the point corresponds to the launch speed.

BrushingZone()

If one clicks on an individual point, one sees the launch speed, launch angle and expected batting average for that ball put into play.

If one brushes over this plot, the app displays the number of balls in play (BIP), the number of hits (H) and home runs (HR), and the average values of launch speed, hit rate, home run rate and expected BA for points in the brushed region.

If one selects Home Run, the points are colored by the outcome (home run or not).

This app can also be used to show locations of hits or expected batting average over the zone.

SprayChart() - Visualizing Locations of In-Play Events

This app shows the spatial locations of all balls put in play for a particular hitter.

NOTE: The graphs are constructed so that the “pull” direction is always the left side of the display. This makes it easier to compare hitters who bat from different sides.
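One way to obtain such a display is to mirror the horizontal coordinate for left-handed batters. The sketch below is our illustration, not the app's code. It assumes a horizontal coordinate x centered on the line from home plate through second base, with negative values toward left field, and a stand variable coded "L" or "R". The function name to_pull_side() is hypothetical.

```r
# Mirror x for left-handed batters so the pull side is always negative (left).
to_pull_side <- function(x, stand) {
  ifelse(stand == "L", -x, x)
}

# Hypothetical batted-ball locations
x     <- c(-50, 80, 30)
stand <- c("R", "L", "L")
x_pull <- to_pull_side(x, stand)
```

With this convention a right-handed batter's coordinates are unchanged, while a ball that a left-handed batter hits to right field is drawn on the left.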

One enters the name of a hitter – here we enter Mike Trout. If “All” is selected for Batted Ball Type, one sees the locations of all balls in play, where the color corresponds to the batted ball type. A table at the bottom gives the frequency distribution of the batted ball type. The subtitle also shows the balls-in-play (BIP) hit rate.

SprayChart()

If one selects “Fly ball”, “Ground ball”, “Line drive”, or “Pop up”, one sees the locations of all batted balls of that type. The color of the point corresponds to the batted ball outcome (Hit or Out).

Below we see the locations of Trout’s flyballs. Note that his hit rate on flyballs is 0.348.

SprayCompare() - Compare In-Play Locations for Two Batters

Using this app, one can compare the locations of batted balls for two hitters. One enters the names of two batters – here we are comparing Mike Trout and George Springer. With Batted Ball Type selected as “All”, one sees the locations of all batted balls, where the color corresponds to the batted ball type. At the bottom, a table of the frequencies of batted ball types is displayed for both batters.

SprayCompare()

If one selects “Fly ball”, “Ground ball”, “Line drive”, or “Pop up”, one sees the locations of all batted balls of that type for both batters.

PitcherFourSeam() - Visualizing Rates of Four-Seam Fastballs Over the Zone for One Pitcher

This app shows values of different rate statistics for four-seam fastballs computed over regions of the strike zone for a specific pitcher.

One inputs the name of a pitcher, specific Statcast seasons to include, and the type of rate desired. There are five types of rates:

  • location – the percentage of four-seamers that fall in each region of the zone
  • swing – the percentage of four-seamers that are swung at for each region
  • miss – the percentage of swings at four-seamers that miss for each region
  • hit – the batting average on balls put into play on four-seamers for each region
  • HR – the home run percentage on balls put into play on four-seamers for each region

For example, by choosing the Rates tab, one sees the rates of Jacob deGrom’s four-seamers over the zone.

If one chooses the Residuals tab, one sees the difference between deGrom’s location rates and the overall location rates for that period. We see that deGrom throws a greater percentage of four-seamers high in the zone to left-handed hitters.

By choosing the Z Scores tab, we assess the difference in rates by use of a Z statistic. Values larger than 2 in absolute value are meaningful. We see that deGrom indeed throws a greater fraction of four-seamers high in the zone to left-handers, since the Z-scores there mostly exceed 2.
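The exact Z statistic used by the app is not stated here. As an illustration, the sketch below uses one common choice, the one-sample test statistic for a proportion, which compares a pitcher's rate in a region with an overall rate p0. The function z_score() and all of the numbers are hypothetical.

```r
# Z statistic for a proportion x / n against a reference rate p0
# (one common choice; the app's formula may differ).
z_score <- function(x, n, p0) {
  p_hat <- x / n
  (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
}

# Hypothetical region: 60 of 400 four-seamers land there, versus 10% overall
x  <- 60
n  <- 400
p0 <- 0.10

residual <- x / n - p0   # difference in rates, as in the Residuals tab
z <- z_score(x, n, p0)   # standardized difference, as in the Z Scores tab
```

In this toy case the residual is 0.05 and the Z statistic is about 3.3, which would count as meaningful under the rule of thumb above.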

One can look at other rates. For example, here are the miss rates of deGrom. Right-handed hitters are pretty likely to miss a deGrom fastball high in the zone.

BatterFourSeam() - Visualizing Rates of Four-Seam Fastballs Over the Zone for One Batter

This app shows values of different rate statistics for four-seam fastballs computed over regions of the strike zone for a specific batter. The design of this app is very similar to that of PitcherFourSeam().

One inputs the name of a batter, specific Statcast seasons to include, and the type of rate desired.

As an example, here is a display of Mike Trout’s in-play batting average on four-seamers thrown by right- and left-handed pitchers.

Here is a display of Mike Trout’s in-play home run percentages on four-seamers thrown by right- and left-handed pitchers.

RadialChart() - A Radial Chart of Balls in Play for a Pitcher in a Specific Game

This illustrates the use of a Baseball Savant Radial Chart to show the launch angle and exit velocity of balls put into play.

To use this app, one types the name of a starting pitcher and a date on which he pitched during the 2019 season. All of the possible starting dates for a given pitcher are listed to make it easier to input the date.

Here is a display of the launch variables of balls put into play against Aaron Nola during the game that he pitched on March 28, 2019.

PredictingBattingRates() - Illustrating the Benefits of Multilevel Modeling

This app illustrates the usefulness of a multilevel model in predicting hitting rates.

The inputs are chosen on the left-hand side of the app. One selects a date during the 2019 season. The model is trained on hitting data up to that date and is used to predict rates for hitting data after that date. One also chooses the type of rate (H, SO or HR), the minimum number of AB for batters in the training dataset, and whether to exclude at-bats by pitchers from the dataset.

By selecting the Rates tab, one sees a parallel dotplot display of the observed rates and the predictions using the multilevel model. The bottom of the screen shows the sum of squared errors of the observed rates and the multilevel model predictions.
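To see why multilevel predictions can beat observed rates, consider a simplified shrinkage estimate that pulls each batter's observed rate toward the pooled rate. This sketch is a stand-in for the app's model, not a copy of it: a fitted multilevel model would estimate the amount of shrinkage from the data, whereas here the constant K is fixed by assumption and all of the counts are hypothetical.

```r
# Hypothetical training data (before the date) and test data (after the date)
y1 <- c(10, 3, 8, 2);    n1 <- c(20, 20, 20, 20)
y2 <- c(30, 25, 32, 22); n2 <- c(100, 100, 100, 100)

observed <- y1 / n1            # raw training rates
pooled   <- sum(y1) / sum(n1)  # overall training rate
K        <- 60                 # assumed shrinkage constant (not estimated)

# Shrinkage estimate: a compromise between each raw rate and the pooled rate
shrunk <- (y1 + K * pooled) / (n1 + K)

future <- y2 / n2              # rates to be predicted
sse_observed   <- sum((observed - future)^2)
sse_multilevel <- sum((shrunk - future)^2)
```

For these toy numbers the shrunken estimates have a much smaller sum of squared errors than the raw rates, which is the pattern the app is designed to show.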

By selecting the Talents tab, one sees the estimated talent curve for the rates.

PredictingBattingRatesPA() - Illustrating the Benefits of Multilevel Modeling for PA data

This app is very similar to the PredictingBattingRates() function. The only difference is that one is looking at rates per plate appearance instead of per at-bat.

PredictiveMaxOfer() - Predictive Checking of a Coin-Flipping Model

This app illustrates predictive checking of a basic model for hitting.

Assume the individual hit outcomes follow a coin-flipping probability of success p. Assume p has a Beta distribution with shape parameters a and b.

The ofers are the runs of at-bats between successes in the binary hit outcomes. We are interested in the predictive distribution of the maximum length of an ofer or the sum of squared ofer lengths among the Bernoulli outcomes.

One selects 90% bounds for the hit probability p. This indirectly selects the shape parameters of the Beta prior. One selects the number of at-bats, the streaky measure of interest, and the observed value of the measure.

One sees a histogram of the streaky measure for 500 simulations of the experiment. If selected, the histogram also marks the observed value with a vertical line, and the app outputs the tail probability.
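The predictive check described above can be sketched in a few lines of base R. This is an illustration under stated assumptions rather than the app's code. The Beta shape parameters a and b are fixed directly instead of being derived from 90% bounds, the observed value is hypothetical, and runs of outs at the start or end of the sequence are counted as ofers.

```r
# Lengths of the runs of hitless at-bats (0 = out, 1 = hit)
ofer_lengths <- function(outcomes) {
  runs <- rle(outcomes)
  runs$lengths[runs$values == 0]
}

# Two streaky measures: the longest ofer and the sum of squared ofer lengths
max_ofer <- function(outcomes) {
  len <- ofer_lengths(outcomes)
  if (length(len) == 0) 0 else max(len)
}
sum_sq_ofer <- function(outcomes) sum(ofer_lengths(outcomes)^2)

set.seed(1)
a <- 30; b <- 70   # assumed Beta shape parameters for p
n_ab <- 500        # number of at-bats

# Draw p from the prior, simulate the at-bats, and record the measure
sims <- replicate(500, {
  p <- rbeta(1, a, b)
  max_ofer(rbinom(n_ab, 1, p))
})

observed  <- 20                      # hypothetical observed longest ofer
tail_prob <- mean(sims >= observed)  # predictive tail probability
# hist(sims); abline(v = observed)   # uncomment to draw the display
```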

PredictiveHotHand() - Predictive Checking of a Markov Switching Model

This app illustrates predictive checking of a streaky measure for a Markov Switching model.

Assume the individual at-bat outcomes are independent Bernoulli with a specific hit probability. For each game, the batter is either in a hot state with hitting probability pH or a cold state with hitting probability pC. The batter moves between the hot and cold states across games according to a Markov chain with staying probability rho. The probabilities pC and pH have independent beta priors.
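A single season under this model can be simulated as follows. This is a minimal sketch with assumptions of our own: every game has the same number of at-bats, the first game is hot or cold with equal probability, and pH and pC are supplied as fixed values rather than drawn from their beta priors. The function name simulate_hot_hand() is hypothetical.

```r
# Simulate at-bat outcomes (1 = hit) under a two-state Markov switching model.
simulate_hot_hand <- function(n_games, ab_per_game, pH, pC, rho) {
  hot <- logical(n_games)
  hot[1] <- runif(1) < 0.5          # assumed: initial state is a fair coin flip
  for (g in seq_len(n_games)[-1]) {
    stay <- runif(1) < rho          # stay in the current state with probability rho
    hot[g] <- if (stay) hot[g - 1] else !hot[g - 1]
  }
  p_game <- ifelse(hot, pH, pC)     # hit probability for each game
  rbinom(n_games * ab_per_game, 1, rep(p_game, each = ab_per_game))
}

set.seed(2)
season <- simulate_hot_hand(n_games = 150, ab_per_game = 4,
                            pH = 0.35, pC = 0.20, rho = 0.9)
```

Repeating this simulation with pH and pC drawn from their priors, and computing a streaky measure each time, gives the kind of predictive distribution the app displays.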

The ofers are the runs of at-bats between successes in the binary sequence. We are interested in the predictive distribution of the maximum length of an ofer or the sum of squared ofer lengths among the Bernoulli outcomes.

One selects beta priors for the two probabilities by specifying limits of 90% bounds, and selects a value of the staying probability rho. One selects a 2019 player of interest and the streaky measure to consider.

One sees a histogram of the streaky measure for 500 simulations of the experiment. The histogram also marks the observed value with a vertical line, and the app outputs the tail probability.

LogitHomeRunRates() - Comparing Home Run Rates for Two Seasons

This app illustrates the comparison of two seasons of home run hitting. All of the data is accessed from the author’s GitHub site.

One decides on the two seasons to compare and the number of groups for categorization of the launch angle and launch speed values. We consider the “home-run friendly” launch angles between 20 and 40 degrees and launch speeds between 95 and 110 mph.

The range of launch values is divided into subregions. The top graph shows the difference of the logits (season2 minus season1) of the rates of getting batted balls for each subregion. The bottom graph shows the difference in logits of the home run rates for each subregion.
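The logit of a rate p is log(p / (1 - p)), so a difference of logits compares two rates on the log-odds scale. The short sketch below shows the calculation for a few subregions. The rates are hypothetical and the helper logit() is defined here for illustration.

```r
# Log-odds of a rate
logit <- function(p) log(p / (1 - p))

# Hypothetical home run rates in three subregions for two seasons
hr_rate_season1 <- c(0.20, 0.35, 0.50)
hr_rate_season2 <- c(0.25, 0.35, 0.45)

# Positive values mean home runs were more likely in season 2
logit_diff <- logit(hr_rate_season2) - logit(hr_rate_season1)
```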

By pressing the Download button, one can download all of the data used to create the two graphs.

HomeRunPaths() - Shiny app to compare home run paths of a selection of home run leaders

This app provides a set of graphs for comparing the home run paths of a selection of sluggers from MLB history. The hitters are the top 30 leaders in career MLB home runs.

One starts by choosing a small set of players from the input palette. Here we are choosing Hank Aaron, Reggie Jackson and Gary Sheffield.

The Home Run Paths tab displays the total home run count for each player graphed against age in years.

The Fitted Slopes tab displays a scatterplot of the home run totals and the average count of home runs per year for the selected players.

The Residuals from Fit tab displays smoothed residuals (actual HR total minus the straight-line fitted HR total) as a function of age for the selected players. This allows us to see how a player’s home run path deviates from a straight line. Here both Jackson and Sheffield show a tendency to hit home runs at a higher than average rate in their mid-30s.
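The idea behind the fitted slopes and residuals can be sketched with a straight-line fit to one player's cumulative home run total. The fitting and smoothing methods used by the app are not described here, so lm() and lowess() below are our stand-ins, and the season totals are hypothetical rather than any real player's record.

```r
# Hypothetical season home run totals by age
career <- data.frame(
  age = 21:30,
  hr  = c(10, 25, 30, 35, 40, 38, 33, 30, 22, 15)
)
career$cum_hr <- cumsum(career$hr)   # the home run path

# Straight-line fit: the slope is the average number of home runs per year
fit   <- lm(cum_hr ~ age, data = career)
slope <- coef(fit)[["age"]]

# Residuals: actual total minus fitted total, then smoothed over age
career$resid <- residuals(fit)
smooth <- lowess(career$age, career$resid)
# plot(career$age, career$resid); lines(smooth)  # uncomment to draw
```

Stretches where the residual is rising correspond to ages at which the player was hitting home runs faster than his career average rate.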

Datasets in Package

All of the data for using these apps is included as part of the ShinyBaseball package.

chadwick

This provides the Statcast ids for all Major League Players.

FF_15_20

This dataset provides Statcast data on four-seam fastballs for the seasons 2015 through 2020.

fg2020batting

This dataset contains stats for the FanGraphs leaders for the 2020 season. This data is used for the FourMeasures(), PointBrush(), PointClick() and PointHover() Shiny apps.

game_info

This dataset contains the game ids for a large number of MLB games.

retro2019

This dataset contains event data for all 191,973 plate appearances for the 2019 season.

sc_pitcher_2019

This dataset provides Statcast data for the 732,473 pitches in the 2019 season. This data is used for the PitchOutcome() and PitchTypeCount() Shiny apps.

sc2019_ip_radial

This dataset contains launch angle, launch speed, estimated batting average, and event data for all balls in play for the 2019 season.

sc2019_ip

This dataset provides Statcast data for the 125,751 balls put in play for the 2019 season. This data is used for the BrushingZone(), SprayChart() and SprayCompare() Shiny apps.

top30homerun

This dataset from Baseball Reference describes each home run hit by the top 30 career home run hitters in MLB history.